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Abstract — Traffic forecasting from past observed traffic 
data with small calculation complexity is one of important 
problems far planning of servers and networks. Focusing 
on World Wide Web (WWW) traffic as fundamental inves- 
tigation, this paper would deal with Bayesian forecasting 
of network traffic on the time varying Poisson model from 
a viewpoint from statistical decision theory. Under this 
model, we would show that the estimated forecasting value 
is obtained by simple arithmetic calculation and expresses 
real WWW traffic well from both theoretical and empirical 
points of view. 

Keywords: World Wide Web (WWW) traffic, traffic engineering, 
statistical decision theory, time varying Poisson distribution, long- 
range dependence (LRD) 

1. Introduction 

Under network environment such as Internet, planning of 
servers and networks is one of important problems for stable 
operation. It is often typical situation that administrators 
analyze logs on their servers and networks. They may 
frequently look into result of log analysis software where 
these tools usually have some functions to periodically sum- 
marize logs. For example, webalizer [1] and analog [9] etc. 
have been widely used among World Wide Web (WWW) 
server administrators or users for long years. These tools 
usually summarize the logs by counting hourly, daily, and 
monthly numbers of hits, files, and pages etc. Administrators 
would often make their operation plans with combination of 
their experience and intuition from these logs. In this case, 
traffic forecasting rule is not clearly formulated and those 
summarized logs remain in the field of descriptive statistics 
from the statistical point of view. 

On the other hand, researchers in the field of traffic 
engineering have been suggesting a lot of analysis models. 
Probabilistic approach is one of viewpoints in this field. It 
is wide-spread fact that the stationary Poisson distribution is 
not always suitable for Internet traffic because of its nature 
of non-stationality [6] [4] and long-range dependence (LRD) 
[5] [6] etc. Therefore desirable conditions of good traffic 
models are to have structures to express such nature at least. 
Furthermore, another requirement of models is to have a 
structure of traffic forecasting. For this point, parameter 
estimation is often performed at first under assumption of the 



stationarity [7], then the estimated parameter is substituted 
for the parameter of model. This approach has been wide- 
spread in the field of inferential statistics from the statistical 
point of view. 

However, substituting the estimated parameter as a con- 
stant for the model's parameter is not always suitable es- 
pecially on forecasting problems. This is because there is 
often no guarantee that the assumptions under the parameter 
estimation of the model always hold for future unknown 
data set. Bayesian approach [2] [3] is one of alternatives for 
this point. In Bayesian approach, a probability distribution 
of parameter is assumed as the prior distribution. If new 
data is observed, then the Bayes theorem updates the prior 
distribution of parameter to the posterior distribution and 
then forecasts the posterior distribution of data. Recently, 
this approach has been widely applied to many forecasting 
problems especially in the field of information technologies 
and bioinformatics etc. In order to take Bayesian approach, 
statistical decision theory is an important theoretical frame- 
work from the statistical point of view. 

Taking the above factors into account, this paper would 
deal with Bayesian forecasting of WWW traffic on the non- 
stationary i.e. time varying Poisson model. Bayesian fore- 
casting on time varying parameter model has been proposed 
in [8] by defining certain class of parameter transformation 
function. However, it has not yet been discussed about any 
predictive estimator nor definite transformation function of 
parameter [8]. This paper would clearly define a random- 
walking type of transformation function of parameter to 
obtain the Bayes optimal prediction for WWW traffic. Then 
its effectiveness would be evaluated with real WWW traffic 
data. In this model, time varying degree is caught by a real 
valued constant fc (0 < /c < 1) and this constant would 
play an important role throughout this paper Another feature 
is that the traffic forecasting value is obtained by simple 
arithmetic calculations under known k. In general, the Bayes 
theorem often results in large calculation costs. However, 
certain combination of parameter distribution and its trans- 
formation function solves this problem. We believe that this 
point can be helpful not only for theoretical calculation cost 
but also for real implementation on WWW log analysis tools 
[1] [9]. 

The rest of this paper is organized as the foUowings. 
Section 2 gives some definitions and explanations of the 



forecasting model with time varying Poisson distribution. 
Section 3 shows some analysis examples of real WWW 
traffic data to validate this paper's approach and Section 
4 gives their discussions. Finally, Section 5 concludes this 
paper. 

2. The Time Varying Poisson Model 

2.1 Definitions 

Suppose X is a discrete random variable and p{x) = 
Pr{X = x} is the probability distribution where real 
number x is each element of space of X. Throughout 
this paper, the probabiUty distribution of X is assumed to 
depend on real valued parameter 9 <= Q. The true parameter 
9* G Q is unknown, however, the probability distribution 
of parameter p{9) is assumed to be known. Hereafter the 
probability distribution of X under parameter 9 is simply 
denoted as p{x I 9). 



For 9t+i,9t > 0, 
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Fig. 1: Overview of the inferential process. 

Let t = 1, 2, • • • be discrete time and a:* = 0, 1, • • • be 
number of WWW request arrivals at time t, respectively. 
This paper focuses on xt for traffic analysis by assuming 
probability distribution p{xt \ 9t) where 9t > is a 
time varying density parameter at time t. The time varying 
Poisson model takes sequence of = xiX2 - --Xt as input 
and calculates Xt+i as an output estimator where the prior 
distribution of parameter p{9t) and time variation rule of 
9t are known. The overview of the inferential process is 
depicted in Figure 1. 

In Figure 1, a;t is assumed to be the Poisson distribution 
with a time varying density parameter 9t as follows: 

For = 0, 1, 2, • • • , 



P{xt \dt)= 



exp {-9t) 



Xt\ 



(1) 



where > is a time varying density parameter. 

For parameter 9t, the following time varying model is 
assumed: 



(2) 



where fc is a constant such that < fc < 1, and < < 
1 is a continuous random variable which is conditionally 
independent from 9t. (2) represents a transformation of 
9t+i from both 9t and Ut under a known constant k. This 
transformation is regarded as a kind of random-walk. 

Furthermore, the initial random variables of 9i and ui 
are from the Gamma and Beta distributions, respectively. 
The followings give their definitions: 

For 9i > 0, 



p{9i ai,Pi) 



r(ai) 



exp(-ei/3i)(ei)("^)-\ (3) 



where ai > 0, (3i > are parameters of the Gamma 
distribution. 

In (3), r (x) is the Gamma function defined below: 



/>oo 

r(a;)= / y''-^exp{-y)dy, 
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(4) 



where a; > 0. 
For < ui < 1, 

p (ui|fcai, (1 — k) ai) 
r(ai) 
r(A;ai)r[(l-fc)ai] 



(5) 



where kai > 0, (1 — fc) ai > are parameters of the Beta 
distribution. 

In (3) and (5), two random variables 9i and ui have a real 
value ai > as their parameters in common. This means 
that 6*1 and ui are conditionally independent. 

Remarks 2.1: In (2), a constant < fc < 1 expresses 
time varying degree of 6't. If A: = 1, (2) simply becomes 
9t+i = Ut9t. In this case, 9t+i does not vary since the 
variance of Ut , which equals to — fc)/ (at + 1) according 
to the nature of Beta distribution, becomes zero. This means 
that the Poisson distribution of Xt in (1) is stationary. 

If fc < 1, on the other hand, 9t+i varies depending on the 
previous 9t which expresses the time varying Poisson model. 
If fc = 0.5, the time varying degree of 9t becomes maximum 
since the variance of Ut takes the maximum value. Thus 
the proposed model defined in (l)-(5) includes a classical 
stationary Poisson distribution as a special case if A: = 1. 
Moreover, k plays another important role which would be 
explained in Remarks 2.2. 

2.2 Updating Density Parameter 9t 

As shown in Figure 1 , the parameter updating rule from 9t 
to 9t+i consists of Bayes theorem and time variation. In the 
followings, this subsection 2.2 is divided into three parts. The 
first 2.2.1 describes the updating rule by Bayes theorem for 
t > 2, the second 2.2.2 describes the initial condition of the 



prior distribution p (^i) for t = 1, and the last 2.2.3 describes 
general time variation rule which is mainly discussed in [8]. 

2.2.1 The Posterior Distribution of Density Parameter 

Suppose that sequence x\^^ is already observed where t > 
2 and the prior distribution of parameter p (^t | at, f3t,x]~^) 
is defined as the Gamma distribution in (3) for t > 2. If new 
xi is observed, the posterior distribution of p [O), | x^) by the 
Bayes theorem is obtained as the following: 



p{Ot 





Ot)p{Ot 








Ot)pi0t 







-exp[ 



t(A + l)](^t) 



(at+a!t)-l 



(6) 

(7) 



where t > 2. 

(7) means that the posterior distribution p {Ot \ x\) is also 
Gamma with parameter {at + xt) and + 1) . 

2.2.2 Initial Condition of Prior Distribution of Density 
Parameter 

For the initial prior distribution of p{Oi), this paper 
assumes no anomalies for WWW traffic. In Bayesian con- 
text, non-informative prior [2] [3] can be considered as the 
following: 

p(^i)cx^. (8) 
PI 

The above distribution can be formulated by the Gamma 
distribution with parameters at > 0,/3i = 1 in (3). 
Therefore, the general posterior updating form in (7) can 
also be applied for t = 1. Thus (7) holds for any t > 1. 
This means that the posterior distribution p (9t | xl'^ can 
be simply calculated for any i > 1 by considering two 
parameters at,(3t on the Gamma distribution if the initial 
prior distribution of (8) is assumed. This is the nature of 
conjugate prior [2] [3] between the Poisson and Gamma 
distributions. This nature contributes drastic reduction of 
calculation complexity under large t. 



2.2.3 Time Variation of Density Parameter 

To obtain p [9t+i \ x\),a time variation of density param- 
eter defined in (2) is used. This is actually a transformation 
of random variables among Ut,9t, and constant k. Even if 
a transformation is newly defined after Bayes theorem, the 
distribution family of density parameter remains same as 
that of the conjugate prior distribution under certain class of 
transformations. Such class has been discussed in [8] as the 
following. 

Theorem 1 (Simple Power Steady Model [8]): Suppose a 
parameter distribution p{6t) is in the linear expanding family. 
Let T be a transformation function of parameter 0t. If T sat- 
isfies the following condition, then the parameter distribution 



remains same family of distribution not depending on t and 
the forecasting model is called Simple Power Steady Model 
(S.PS.M.) with respect to T: 



T{x) = 4'a;^ ^' > 0, < fc < 1. 



(9) 



If T satisfies (9) and is one-to-one mapping, the model can 
also be S.P.S.M.[8]. However, it has not yet been discussed 
about any definite transformation function which is required 
to obtain the forecasting estimator. In fact, the time varying 
parameter model defined in (3) is one-to-one mapping since 
Jacobian of (3) is easily proved to be non-zero. Thus the 
model in this paper is actually included in S.P.S.M. The 
forecasting estimator would be derived in the next subsection 
2.3. 

Under S.P.S.M. in this paper, the transformed distribution 

of 9t+i now becomes: 



P (Ot+i x{j 



■P 



t+i 

k 



(10) 



1)] 



k{at+xt) 



r [k {at + Xt)] 

X eM-Ot+ik (A + l)](0t+i)'^"'+"*^T^ (11) 

(10) and (11) mean that the transformed distribution of 
9t+\ becomes the Gamma distribution with the following 
parameters: 

at+i = k{at +xt) ; 

A+i=/e(A + l). ^ ^ 

If (12) is recursively applied with respect to t, the follow- 
ing equations are obtained: 

at+i = k*ai + J2Ui k^'^'^-'Xi ; 
Pt+i=k*Pi+EUk'-'. ^ ^ 

The above equations contribute drastic reduction of cal- 
culation complexity. 

2.3 Output Estimator xt+i 

The output of proposed model is an estimator xt+i as 
depicted in Figure 1. xt+i is a prediction of number of 
request arrivals at time {t + 1) under the input sequence 
x\ = X\X2 ■ ■ - Xt. Xt+i is formulated in terms of statistical 
decision theory [2] [3] and derived as the following. 

Let xt+i be an estimator of xt+i and define the following 
squared-error loss function to evaluate Xt+i- 

L{xt+i,xt+i)= {xt+i - xt+if. (14) 

Since Xt+i distributes with the Poisson defined in (1), risk 
function [2] [3] under the certain density parameter Ot+i 
becomes the following: 



R{xt+i,6t+i)=^ [L{xt+i,xt+i)p{xt+i \ Ot+i)] 

xt+i=0 



(15) 



=X] - ^t+-^f Pi^t+i\Ot+i)\ ■ (16) 

xt+i=0 



Next, the Bayes risk function [2] [3], which is obtained by 
taking expectation of the risk function with respect to density 
parameter 9t+u becomes the following: 



number of request arrivals was counted with every five- 
minutes-interval except for maintenance period. The detail 
specifications are described in Table 1. 



BR{xt+i) 

OO 

R{xt+i,0t+i)p{0t+i)d9t+i 



(17) 



V [xt+i - xt+if / P{xt+i\ Ot+i)p{Ot+i)det+i . (18) 
.,,=n Jo 



xt+i=0 



Finally, suppose an estimator x^^-^ to minimize the Bayes 
risk function defined in (18). Such estimator is called the 
Bayes optimal prediction [2] [3]. Under this paper's assump- 
tions, f is obtained as follows: 



= arg min BR {xt+i ) 



(19) 



= / P(^t+i I 0t+i,x\)p{0t+i\x\)d6t+i (20) 

= E [xt+i I x\] (21) 
= E [Ot+i I x\] (22) 



(23) 

(24) 



Note that (22) is obtained since xt+i is the Poisson distribu- 
tion defined in (1) and the expectation of xt+i corresponds 
to the that of density parameter 9t+i. (23) is obtained 
since 9t+i has the Gamma distribution defined in (3) and 
its expectation becomes at+i/Pt+i- (24) is obtained by 
applying (13) to (23). 

Remarks 2.2: In (24), x*^i is obtained by simple arith- 
metic calculation. This point can be effective not only 
theoretical point of view but also the real implementation 
such as server log analysis software tools. The second term 
of numerator in (24) has a form of Exponentially Weighted 
Moving Average [8] with a time varying constant k. As 
k becomes larger in (24), the weighting of past observed 
sequence x\ increases. This means that k can be considered 
as a parameter of long-range dependence (LRD). If fc = 1, 
its weighting becomes maximum and the model is regarded 
as stationary model. The effect of k under the real WWW 
traffic data would be considered in the next section. 

3. Analysis Examples of WWW Traffic 
Data 

3.1 WWW Traffic Data 

The real WWW traffic data was derived from access logs 
of two different WWW servers (A, and B) on campus. The 



Table 1: Specifications of real WWW traffic data. 





Server A 


Server B 


Request Arrivals 


154,932 


274,302 


Start Date 


Mar. 18, 2005 


Mar. 18, 2006 


End Date 


Apr. 9, 2005 


Apr 9, 2006 


Time Length 


13d 12h 10m 


13d Ih 10m 


Time Intervals 


3,890 


3,758 



3.2 Maximum Likelihood Estimation for k 

Properties described in the previous section hold on con- 
dition that a time varying constant k in (2) is known. If 
real data is dealt with, however, k is unknown and should 
be estimated. Taking the maximum likelihood estimation 
of k, the objective hkehhood function L{k) becomes the 
following: 



m 

t 

= p{xi I 6'i) 



„i-l 



,k) 



i=2 

t 



P{xi \ p{x,\x\-]0i)p{0i\x\-^)d0. 

p{xi I 6*1) 

(A)"'r(a,+.T,) 



(25) 
(26) 



n 



(A + l)"'+"-r(ai)xi! 



Pi 



(27) 



Note that ai,Pi in (27), the previously obtained results 
in (13) were appUed. Therefore, the maximum likelihood 
estimator (MLE) of k is obtained as follows: 



k= arg max L (fc) . 

k 



(28) 



On the above likelihood function, the analytical maximum 
hkehhood estimation for k is quite difficult, however, the 
solution can be obtained by numerical calculation. The 
interval < fc < 1 is divided into 1,000 sub-intervals and 
value of log L(k) is calculated for each k numerically. Some 
plots of function logL(A;) are shown in Figure 2. Some 
examples of MLE for k are also shown in Table 2. 

3.3 Point Estimation for WWW Traffic Fore- 
casting 

The real WWW data described on Table 1 was processed 
to evaluate the point estimates of future request arrivals on 
the proposed model. For the performance comparison, the 
point estimates on the classical stationary Poisson model 
were also calculated. On the proposed model, each MLE 



5.5x10^ 
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Fig. 2: Examples of log-likelihood functions. Left top is in 
Mar. 23, 2005 on server A; Right top is in Mar. 31, 2005 
on server A; Left bottom is in Mar. 27, 2006 on server B; 
Right bottom is in Apr. 08, 2006 on server B. 



Table 


3; Mean 


square 


d error 


on sei'vei" A. 


Server A 


Proposed Model 


k 


Stationary Model 


Mar. 19 


2.829 X 


102 


0.716 


4.988 X 102 


Mar. 20 


2.657 X 


102 


0.762 


3 297 X 102 


Mar. 21 


3.816 X 


102 


0.753 


A 802 X 102 


Mar. 22 


7.111 X 


102 


0.805 


Q ,t;,s4 X 102 


Mar. 23 


8.202 X 


102 


0.759 


1 335 X 10^ 


Mar. 24 


1.356 X 


103 


0.804 


3 458 X 1 0^ 


Mar. 25 


9.523 X 


102 


0.804 


2.062 X 10^ 


Mar. 26 


4.811 X 


102 


0.783 


6 479 X 1 02 


Mar 27 


8.596 X 


102 


0.771 


1.239 X 10^ 


Mar. 28 


1.980 X 


10-^ 


0.754 


4.041 X 10^ 


Mar. 29 


1.657 X 


10^ 


0.777 


4.019 X 10^ 


Mar. 30 


4.940 X 


102 


0.788 


7 568 X 1 02 


Mar. 31 


8.088 X 


102 


0.787 


1 218x1 n-^ 


Apr 01 


8.967 X 


102 


0.775 


1.887 X 10'^ 


Apr 02 


1.258 X 


IQi 


0.753 


1.220 X IQi 


Apr 03 


4.184 X 


lOO 


0.826 


4.375 X 10" 


Apr 04 


4.206 X 


IQi 


0.914 


4.317 X IQi 


Apr 05 


4.095 X 


IQi 


0.666 


3.808 X W 


Apr 06 


3.612 X 


IQi 


0.710 


4.723 X IQi 


Apr 07 


2.813 X 


102 


0.661 


5.183 X 102 


Apr 08 


2.295 X 


lOi 


0.786 


2.325 X lOi 


Apr 09 


7.589 X 


10° 


0.803 


8.127 X 10° 



Table 2: Maximum likelihood estimates for k. 
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Name 


Date 


MLE of k 


Server A 


Mar 23, 2005 


0.804 


Server A 


Mar 31, 2005 


0.775 


Server B 


Mar. 27, 2006 


0.857 


Server B 


Apr 08, 2006 


0.764 



of k was calculated from the previous day's log. To eval- 
uate performance on both models, the mean squared error 
between each point estimate and observed value of request 
arrivals was calculated. 

Table 3 and 4 show mean squared error of proposed 
and stationary models on server A and B, respectively. In 
each table, the MLEs of k from the previous day's logs are 
showed on the third row. Figure 3 shows point and interval 
estimates v.s. observed values plot of server A on Mar 25, 
2005 where k = 0.804. In Figure 3, the vertical axis is the 
number of request arrivals and the horizontal axis is time 
interval index. The solid line, solid chain line, and histogram 
represent the point estimates on the proposed model, those 
on the classical stationary Poisson model, and observed 
values of request arrivals, respectively. Dotted hues are 95% 
interval estimates of proposed and stationary models. Figure 
4 also shows point and interval estimates v.s. observed values 
plot of server B on Mar. 28, 2006 where k = 0.857. 

3.4 Interval Estimation for WWW Traffic 
Forecasting 

Table 5 shows interval estimation example of server A 
on Mar. 25, 2005. In Table 5, <= 104 is taken since it 
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Fig. 3: Point and interval estimates v.s. observed values plot 
of server A on Mar. 25, 2005 (fc = 0.804). 



gives max observed value a;io4 = 210 as the numbers of 
request arrivals. The second, third, and forth rows show the 
expected value, 95% confidence Umit, and 99% confidence 
limit, respectively on the proposed and stationary models. 
Table 6 also shows interval estimation result of server B on 
Mar. 28, 2006 where f = 86 is taken. 

4. Discussion 

4.1 Maximum Likelihood Estimation for k 

According to Figure 2, it is showed that there exist 
some cases where their likelihood functions of logL(fc) are 
convex. Actually, all likelihood functions were convex to the 



Tabic 4 


: Mean 


square 


d error on server B. 


Server B 


Proposed Model 


k 


Stationary Model 


Mar. 19 


3.070 X 


102 


0.777 


3.874 X 10 


Mar. 20 


1.302 X 


10^ 


0.820 


2.386 X 10^ 


Mar. 21 


5.159 X 


102 


0.782 


5.896 X 10^ 


Mar. 22 


1.244 X 


10^ 


0.832 


1.650 X 10^ 


Mar. 23 


1.741 X 


10^ 


0.827 


4.151 X 10^ 


Mar. 24 


1.109 X 


10^ 


0.836 


1.980 X 10^ 


Mar. 25 


4.705 X 


10^ 


0.812 


5.436 X 102 


Mar. 26 


1.504 X 


10^ 


0.755 


1.915 X 10^ 


Mar. 27 


2.964 X 


10^ 


0.800 


7.918 X 10^ 


Mar. 28 


1.109 X 


IQi 


0.857 


1.932 X 10^ 


Mar. 29 


2.019 X 


10« 


0.805 


5.576 X 10^ 


Mar '^0 


4.370 X 


10^ 


0.784 


6.708 X 102 


Mar. 31 


5.318 X 


10^ 


0.799 


6.750 X 102 


Apr 01 


2.568 X 


102 


0.710 


3.327 X 102 


Apr 02 


4.739 X 


10^ 


0.640 


5.118 X 102 


Apr. 03 


6.019 X 


102 


0.745 


8.237 X 102 


Apr. 04 


1.544 X 


103 


0.791 


5.448 X 10^ 


Apr 05 


1.449 X 


102 


0.820 


1.709 X 102 


Apr 06 


4.712 X 


102 


0.791 


5.620 X 102 


Apr 07 


2.784 X 


102 


0.777 


3.567 X 102 


Apr 08 


3.200 X 


10^ 


0.764 


1.082 X lO* 


Apr. 09 


1.333 X 


10^ 


0.841 


3.128 X 103 



300 




Time Interval Index 

Fig. 4: Point and interval estimates v.s. observed values plot 
of server B on Mar. 28, 2006 (k = 0.857). 



best of numerical calculations in this paper. Figure 2 also 
shows that the absolute value of gradient in log L{k) around 
MLE of k becomes quickly larger as k increases beyond k. 
This fact shows that the under estimation for k causes less 
error than its over estimation. 

For the estimated value of k, the minimum was k = 0.640 
and the maximum was k = 0.914 as shown in Table 2, 
3, and 4. Most of k distributes between 0.700 and 0.850. 
This fact suggests non-stationarity of the WWW traffic data. 
Table 7 shows Akaike Information Criterion (AlC) on server 
A. According to Table 7, most of AIC on the proposed 
model are smaller than those of stationary model to select 
the proposed model. Exception is that absolute values of AIC 



Table 5; Interval eslimalion of server A on Mar. 25, 2005. 
t = 104 on Server A Proposed Stationary 



£104 (Expected Value) 111 69 

X104 (95% Confidence Limit) 133 83 

X104 (99% Confidence Limit) 142 89 

X104 (Max Observed Value) 210 



Table 6: Interval estimation of server B on Mar. 28, 2006. 

t = 86 on Server B Proposed Stationary 



X86 (Expected Value) 148 126 

X86 (95% Confidence Limit) 170 145 

X86 (99% Confidence Limit) 180 153 

X86 (Max Observed Value) 289 



on the two models suddenly become smaller around Apr. 01, 
2005 on server A. This is actually because the average traffic 
on server A drastically decreased to less than 1,000 request 
arrivals per a day. In this period, the traffic in each time 
interval stayed at lower level and could be regarded as the 
stationary Poisson model. For server B, on the other hand, 
such traffic decrease did not occur and the proposed model 
is chosen for all days by AIC model selection. As a whole, 
it can be concluded that the proposed model has stronger 
validity for real WWW traffic data than the stationary model 
from the viewpoint of model selection. 

4.2 Point Estimation for WWW Traffic Fore- 
casting 

For point estimation of future request arrivals. Table 
3, and 4 show that the proposed model has the better 
performance than that of the stationary model in terms of 
mean squared error. Figure 3 and 4 also depict that the point 
estimates on the proposed model are following more closely 
to the observed values than those on stationary model. As 
mentioned in Remarks 2.1, the proposed model contains 
the stationary model as a special case when k = 1.000. 
Therefore, regardless of its stationarity or non-stationarity of 
WWW traffic, the proposed model can be applied to traffic 
forecasting and would help for planning of WWW servers 
to some extent. 

However, it should be noted that this result strongly 
depends on the accuracy of MLE of k. In Table 3, each 
MLE of k during days in April often differs from k = 
1.000 in spite of its strong stationarity on the real traffic. 
In such situation, the mean squared error on the proposed 
model becomes larger than that of stationary model. Another 
example is that if A; = 0.300 on the proposed model with 
server A, the mean squared error of the proposed model 
becomes 5.41 x 10"^ where that of stationary model becomes 
3.46 X lO'^. Figure 5 depicts this poor performance of the 
proposed model. Figure 6 is a plot of mean squared errors 
v.s. k. In interval of fc < 0.600, the extremely smaller 



estimate of k could cause larger mean squared error. The 
under estimation near the MSE minimizer of k, however, 
causes relatively smaller error than its over estimation as 
previously described in subsection 4.1. 

In Figure 6, mean squared errors takes minimum values 
around k = 0.800. In fact, Table 3 and 4 show that 
corresponding maximum likelihood estimates for k are k = 
0.783 for server A and k = 0.805 for server B, respectively. 
This result suggests that the data length of previous day's log 
at the maximum hkehhood estimation for k was sufficient. 

4.3 Interval Estimation for WWW Traffic 
Forecasting 

For interval estimation, time interval indices that give the 
maximum number of request arrivals are taken on Table 5 
and 6. This is because one of administrators' concerns can 
be the maximum number of the request arrivals in terms of 
stable server operations. As a result, each confidence limit 
of Xt derives larger value than that of point estimate of Xt 
(=expected value) and reduces the mean squared error than 
that of point estimate. This effect on the proposed model 
would be stronger than that on the stationary model, since the 
performance of point estimates of xt on the proposed model 
is superior to that of stationary model. Thus the advantage 
of Bayesian approach was observed. 

Table 7: Akaike Infonnation Criterion (AlC) on server A. 



Table 8: Akaike Information Criterion (AIC) on server B. 



AIC on Server A 



Proposed 



Stationary 



Mar. 


19, 


2005 


-3 


562 


X 


10^ 


-3.367 X 103 


Mar. 


20, 


2005 


-3 


102 


X 


103 


-3.011 X 103 


Mar. 


21, 


2005 


-5 


085 


X 


103 


-4.960 X 103 


Mar. 


22, 


2005 


-8 


316 


X 


103 


-8.108 X 103 


Mar. 


23, 


2005 


-1 


052 


X 


10* 


-1.018 X 10* 


Mar. 


24, 


2005 


-1 


797 


X 


10* 


-1.715 X 10* 


Mar. 


25, 


2005 


-1 


088 


X 


10* 


-1.031 X 10* 


Mar. 


26, 


2005 


-3 


917 


X 


103 


-4.649 X 103 


Mar. 


27, 


2005 


-9 


023 


X 


103 


-8.723 X 103 


Mar. 


28, 


2005 


-1 


875 


X 


10* 


-1.795 X 10* 


Mar. 


29, 


2005 


-1 


869 


X 


10^ 


-1.774 X 10"' 


Mar. 


30, 


2005 


-5 


825 


X 


103 


-5.610 X 103 


Mar. 


31, 


2005 


-1 


003 


X 


10-* 


-9.784 X 10^ 


Apr. 


01, 


2005 


-6 


456 


X 


10^^ 


-5.783 X 10^ 


Apr. 


02, 


2005 


-1 


538 


X 


IQi 


-2.313 X IQi 


Apr. 


03, 


2005 


+1 


763 


X 


IQi 


+8.860 X 10" 


Apr. 


04, 


2005 


-1 


655 


X 


10^ 


-1.615 X 10^ 


Apr. 


05, 


2005 


-1 


821 


X 


10^ 


-1.803 X 10^ 


Apr. 


06, 


2005 


-1 


771 


X 


10^ 


-1.516 X 10^ 


Apr. 


07, 


2005 


-2 


826 


X 


10^ 


-2.675 X 10^ 


Apr. 


08, 


2005 


-1 


170 


X 


10^ 


-1.184 X 10^ 


Apr. 


09, 


2005 


+6 


273 


X 


10° 


+4.767 X 10-1 



5. Conclusion 

This paper showed Bayesian forecasting of WWW traffic 
on the time varying Poisson model. This model is obtained 
by defining a random-walk type of time varying parameter 



AIC on Server B 



Proposed 



Stationary 



Mar. 


19, 


2006 


-3 


270 


X 


10^ 


-3 


195 


X 


10^ 


Mar. 


20, 


2006 


-1 


244 


X 


10^ 


-1 


201 


X 


10* 


Mar. 


21, 


2006 


-1 


245 


X 


lO* 


-1 


201 


X 


10* 


Mar. 


22, 


2006 


-1 


652 


X 


10"* 


-1 


6,34 


X 


10* 


Mar. 


23, 


2006 


-2 


231 


X 


10"* 


-2 


165 


X 


10* 


Mar. 


24, 


2006 


-1 


217 


X 


10"* 


-1 


180 


X 


10* 


Mar. 


25, 


2006 


-4 


482 


X 


10^ 


-4 


395 


X 


10-^ 


Mar. 


26, 


2006 


+1 


504 


X 


103 


+1 


915 


X 


103 


Mar. 


27, 


2006 


-3 


504 


X 


10* 


-3 


403 


X 


10* 


Mar. 


28, 


2006 


-2 


130 


X 


lO'' 


-2 


086 


X 


10* 


Mar. 


29, 


2006 


— 1 


958 


X 


10* 


-1 


852 


X 


10* 


Mar. 


30, 


2006 


-4 


601 


X 


103 


-4 


428 


X 


103 


Mar. 


31, 


2006 


-3 


417 


X 


103 


-3 


282 


X 


103 


Apr. 


01, 


2006 


-8 


741 


X 


102 


-8 


145 


X 


10^ 


Apr. 


02, 


2006 


-4 


393 


X 


103 


-4 


308 


X 


103 


Apr. 


03, 


2006 


-7 


633 


X 


103 


-7 


462 


X 


103 


Apr. 


04, 


2006 


-1 


837 


X 


10* 


-1 


710 


X 


10* 


Apr. 


05, 


2006 


-1 


033 


X 


103 


-9 


828 


X 


102 


Apr. 


06, 


2006 


-3 


311 


X 


103 


-3 


183 


X 


103 


Apr. 


07, 


2006 


-1 


958 


X 


103 


-1 


838 


X 


10-' 


Apr. 


08, 


2006 


-3 


384 


X 


10* 


-3 


211 


X 


10* 


Apr. 


09, 


2006 


-1 


090 


X 


10* 


-1 


010 


X 


10* 
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Fig. 5: Point and interval estimates vs. observed values plot 
of server A on Mar. 25, 2005 (k = 0.300). 



function on Simple Power Steady Model. The forecasting 
estimator of this model guarantees the Bayes optimality 
in terms of statistical decision theory and is calculated by 
simple arithmetic calculation. The latter point especially can 
be effective for the real implementation such as server log 
analysis software tools. 

Furthermore, the non-stationarity is expressed by a time 
varying degree constant k in the model. This paper pointed 
out that the constant k can be considered as a parameter 
of long-range dependent (LRD) for real traffic data and the 
model includes stationary Poisson model as a special case 
if fc = 1. 



1.4x10* 



1.2x10* 



Server A (Expected) 

Server A (95%) 

Server A (99%) - ■ - 
Server B (Expected) 

Server B (95%) 

Server B (99% 




[7] Antoine Sclierrer, Nicolas Larrieu, Ptiilippe Owezarski, Pierre Borgnat, 
Patrice Abry, "Non-Gaussian and Long Memory Statistical Char- 
acterizations for Internet Traffic with Anomalies," IEEE Trans, on 
Dependable and Secure Computing, vol.4, no. 1, pp. 56-70, 2007. 
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Fig. 6: Mean squared error v.s. A; plot of server A on Mar. 
25, 2005 and server B on Mar. 28, 2006. 



For evaluation of Bayesian approach, the real WWW 
traffic data is appUed to the model in this paper. The 
maximum likelihood estimation method of k from real traffic 
data is also discussed and its performance is proved to be 
sufficient for WWW traffic forecasting. 

According to its result, the proposed model has stronger 
vahdity than classical stationary Poisson model in terms of 
model selection. Furthermore, under the estimated value of 
k, the point and interval estimates on the proposed model 
showed smaller mean squared error comparing to those 
on the stationary model for the traffic forecasting. Thus 
the advantage of the proposed model is shown from both 
theoretical and empirical points of view. 
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